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Abstract 

We develop a quadratic regularization approach for the solution of high-dimensional multi¬ 
stage stochastic optimization problems characterized by a potentially large number of time peri¬ 
ods/stages (e.g. hundreds), a high-dimensional resource state variable, and a Markov information 
process. The resulting algorithms are shown to converge to an optimal policy after a finite num¬ 
ber of iterations under mild technical assumptions. Computational experiments are conducted 
using the setting of optimizing energy storage over a large transmission grid, which motivates 
both the spatial and temporal dimensions of our problem. Our numerical results indicate that the 
proposed methods exhibit significantly faster convergence than their classical counterparts, with 
greater gains observed for higher-dimensional problems. 


1 Introduction 


Multistage stochastic problems arise in a wide variety of real-world applications in fields as diverse 
as energy, finance, transportation and others. In this paper we consider multistage stochastic linear 
programs that satisfy the following conditions: i) the time horizon length T is finite but potentially 
large (there may be hundreds of time periods and stages); ii) for each time period, the set of sample 
realizations of the exogenous information process is finite (and relatively small); Hi) for each stage, 
the stage cost is a linear function of the decision. 

Pereira and Pinto in introduced a powerful algorithmic strategy known as Stochastic Dual Dy¬ 
namic Programming (SDDP) that has received considerable attention for this problem class. Despite 
its popularity, SDDP can exhibit slow convergence, especially in the setting of high-dimensional re¬ 
source allocation problems. A separate but important challenge arises when handling problems with 
long horizons which introduces algorithmic issues for both the setting of intertemporal indepen¬ 
dence, as well as when there is Markov dependence. Not surprisingly, as practical problems grow 
in size, improving the rate of convergence of SDDP-type methods becomes an issue of growing 
importance. 

Quadratic regularization has been among the most effective techniques for accelerating the con¬ 


vergence of scenario tree-based decomposition methods (see work by Ruszczyhski 1126112711291] ). 
However, its application to the SDDP framework has not been possible because of the exponential 
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growth of the number of required incumbent solutions. In this work, we propose a new regulariza¬ 
tion approach which overcomes that challenge. The method can lead to much faster convergence 
by reducing the oscillation of solutions around distant vertices of the feasible regions where the 
addition of new cutting hyperplanes might be of little value. 

This paper makes the following contributions: i) We adopt notation that bridges the gap be¬ 
tween dynamic programming and classical stochastic programming, which lays the foundation of 
our algorithmic strategy by identifying and clarifying the role of the post-decision information 
state; ii) We develop the first quadratic regularization method for the SDDP framework, with or 
without Markov dependence in the information process, that produces an optimal policy for a sam¬ 
pled model; Hi) Unlike existing regularization methods on scenario trees, our approach remains 
computationally tractable even for problems that involve long time horizons; iv) Our numerical re¬ 
sults indicate that the proposed approach exhibits faster convergence than classical SDDP and is 
especially useful for problems with high-dimensional resource states. That makes the work relevant 
to a wide variety of practical applications. 

Our numerical work uses the setting of optimizing energy over a fleet of storage devices for a 
congested transmission grid. This problem class offers a realistic setting for testing the algorithm 
with anywhere from 50 to 500 batteries, allowing us to test the performance of the algorithm for 
resource state variables with widely varying dimensionality. A separate challenge is that these prob¬ 
lems exhibit a large number of time periods; our experiments model a day in 5-minute increments, 
producing problems with 288 time periods. 


2 Literature Review 


The decomposition approach of Benders ||3l and the L-shaped method of Van Slyke and Wets 
originally focused on the solution of two-stage stochastic optimization problems. Eventually, the 
idea was extended to the multi-period setting by Birge ||3], as well as Donohue and Birge |@] who 
considered successive Benders-type approximations of the recourse functions in the nested Benders 
decomposition algorithm for multistage problems on scenario trees. Pereira and Pinto in further 
extended the approach to develop Stochastic Dual Dynamic Programming which has become popu¬ 
lar among practitioners. On one hand, the method provides both lower and upper bounds, as well as 
clear convergence guarantees for many of its different versions as has been discussed Shapiro 1321] . 
as well as Linowsky and Philpott Q. Moreover, it is also very suitable for parallel computing and 
can be applied to problems with long time horizons. Despite its progress towards overcoming the 
curse of dimensionality, in its essence SDDP is a cutting plane method, a class of algorithms known 
to exhibit slow convergence (see a behavior that is a byproduct of the well-known curse of 

dimensionality. In general, their computational complexity grows exponentially with the dimension 
of the problem. In the special case of only two time periods, the SDDP algorithm is equivalent 


to the well known cutting plane method of Kelley Ill2|] which takes O 


ln( 


itera- 


21n2 LysJ 

tions to achieve an e-optimal solution on an n-dimensional problem as pointed out by Nesterov and 
Nesterov S. 

Rockafellar El introduced the proximal point algorithm for the minimization of (deterministic) 
lower semicontinuous proper convex functions. The quadratic regularization of two-stage linear 


stochastic optimization problems was developed by Ruszczyriski 1125112911 . The same idea has also 


been implemented in the two-stage and multistage versions of the Stochastic Decornposition method 
developed by Higle and Sen 11(113 ill , as well as the decomposition work of Morton OITH . All of these 
methods utilize a scenario tree, either explicitly or implicitly (when indexing regularization terms 
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by the entire history Ht), and consider separate incumbent solutions for every parent node in the 
scenario tree. That limits their applicability to problems with short time horizons. On the other 
hand, the method described below is universal and can be applied to problems with a large number 
of time periods. 


3 Problem Formulation 

Given a probability space {Q,T,P) with a sigma-algebra P, and a filtration {0,G} = C 
J-2 C ■ ■ ■ C Ft = we consider a stochastic process {Wt, t = 1,... ,T} adapted to {Pt, t = 
1,... ,T}. Throughout our presentation, we adopt the convention that any variable indexed by t is 
-measurable (surprisingly, this is not a standard assumption). Our goal is to develop new solution 
methods for the following multistage linear stochastic programming problem: 


min(co, xo) -f Ei 

A^XQ—hQ 

min(ci, Xi) + E 2 

Boaio+^ia^i=^>i 


min {ct,xt) 

Bt ~ 1 — i -\-Atii^t—^t 


Xq>^ 

xi >0 

_ 


J _ 


The components of the information process Wt = {At, Bt,bt,ct),t = 1, ... ,T are the -measurable 
random matrices At, Bt and vectors bt,Ct, while Aq , i?o 7 , cq are assumed to be deterministic com¬ 

ponents of the initial state of the system S'o = (Aq, Bq, (>07 cq)- We denote the sets of possible re¬ 
alizations of Wt with rit, t = 1,... ,T. Those correspond to nested partitions of Q, given by the 
filtration {Ft, t = 1,... ,T}, and each w G ft can be represented as w = (a;i,W 27 ... ,u!t) G 
fli X fl 2 X ■ ■ ■ X fir. We assume that each sample set fit has a finite number of elements that is 
small enough to be enumerated computationally. 

Definition 1. The information history at time t is Ht = {S'o, wi, ^27 ..., w*}, where Ht G TLt = 
{^0} X rii X fl 2 X, - ■ ■ X fit- Further, we define the post-decision information history at time t to 
be Hf = {So,xt),uJi,xi,uj 2 ,X 2 ,... ,ujt,xt}. 

Employing a dynamic programming framework, we distinguish between two types of states of 
the system, the pre-decision states St and the post-decision states Sf. 

Definition 2. The (pre-decision) state St of the system at time t > 1 is all the information in 
Hf_i U (jJt that is necessary and sufficient to make a decision at time t, and model the impact of 
Hf_i U LOt on the computation of costs, constraints and transitions from time t onward. 

Furthermore, the pre-decision state of the system St can be represented as St = {Rt, It), where 
the pre-decision resource state Rt is the amount of resources available at the beginning of time 
period t, and R is the pre-decision information state. Please note that Rt depends on both the 
decision Xt-i and the random vector bt. 


Rt — Bt-iXt-i — bt. 


The information state R contains all the remaining information of St that is necessary and sufficient 
to model the system but is not in Rt. Formally, we consider the following model for the evolution 
of the system over time: 


-^X ^2 
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Definition 3. The post—decision state ,t > 0 of the system at time t is all the information in 
the post—decision history iJf that is necessary and sufficient to model the impact of Hf on the 
computation of costs, constraints and transitions from time t onward, after a decision has been 
made. 

We also represent the post-decision state of the system as Sf = , If). The post-decision 

resource state Rf is given by 

Rt = Btxu 

and the post-decision information state If represents all the information in Sf that is not in Rf. 
Moreover, we refer to the rank of the matrix Bt as the dimension of the post-decision resource state. 
If we dehne 

C{St,Xt) := {ct,Xt) 

and the set Xt{St) is such that the following conditions are satisfied, 

■y (O \ ^ f Xt eU"* : AtXt =bt, if f = 0 

xt€ R"* : Bt_ixt-i + AtXt =bu iff > 0 

then we can rewrite problem ([l]) using dynamic programming notation as follows. 


min C'(5'o, xtf) + Ei 
a:o€A:'o(<So) 


min Xi) + E 2 

a:ieA4(Si) 


• • • + Et 



( 2 ) 


Since the problem is stochastic, its optimal solution is not a vector but rather a policy tt, which is 
a function that maps states St to decisions xt G Xt{St). Thus, we can consider the optimization 
problem ((J]) to be a search for an optimal policy tt* over the set 77 consisting of all feasible and 
implementable policies 


min E 

ttGIT 


Y,C{St,Xf{St)) 


.t=0 


(3) 


We refer to equation ([3]l as the base model, and we can solve it by constructing an optimal looka¬ 
head policy. In that case, the optimal decisions Xf{St) corresponding to tt* satisfy the following 
optimality equation: 


Xf{St) G argmin 

xt^XtlSt) 


C{St 


Xt) + min E 

Ti-gn 


T 


Y, CiSt^,XfiSt^)) 

t'=t+l 



(4) 


Therefore, we can also specify an optimal lookahead policy tt* by employing its corresponding 
post-decision value functions Vf{Sf), 


VfiSf) = minE 


T 


^ C{St>,Xf{St~)) 

t'^t+1 



(5) 


Remark 1. It is common for practitioners to employ a scenario tree in order to construct an ap¬ 
proximate lookahead policy for problem m as follows: 


Xf{St) G argmin 

xt€Xt{St) 


C{St,xfj -f minE 

Treir 


^ C{St,,Xf,{St,)) 

t'=t+l 


Si 


(6) 
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When t" < T the optimality of the approximate lookahead policy given by cannot be estab¬ 
lished. In addition, lower and upper bounds to the optimal value of problem m might not be readily 
available (due to approximation errors stemming from the stage reduction). Nonetheless, lookahead 
models can produce high-quality solutions in selected problems (see Ml)- 

Thus, at any time period t — 0,... ,T, the optimal decision X^{St) for problem (l2| can be 
computed as 

XUSt) G argmin{C(5t,Xt) + 

xteXt(St) 

Hence, the search for an optimal policy tt* is equivalent to the computation of optimal post-decision 
value functions Vf{Sf), t = 0,... ,T. One of the well-known methods that allows us to construct 
such value functions is Stochastic Dual Dynamic Programming. 


4 Stochastic Dual Dynamic Programming 

Typically, stochastic programming techniques model the flow of information by utilizing a scenario 
tree that involves the entire set Ht = {S'o} x Di x • • • x Dt. While such an approach is useful for 
analytical purposes, its practical applicability is limited by the exponential growth of the number 
of nodes in the scenario tree when the length of the time horizon T increases. To overcome that 
challenge, Pereira and Pinto El introduced the Stochastic Dual Dynamic Programming (SDDP) 
method for the solution of multistage stochastic linear optimization problems over long time hori¬ 
zons. SDDP overcomes the combinatorial explosion of the information state by exploiting (a key 
and limiting assumption of) stagewise independence as ¥{uJt+i\Ht) = P(a;t+i), and therefore all 
post-decision states Sf share a single information state If. Hence, Sf = Rf and the optimal value 
functions Vf(Sf) only depend on the post-decision resource states Rf, 

v;(sf) = v;(Rf),t = o,...,T. 

The convexity of the optimal value functions Vf {Rf) is the key property that allows one to partially 

escape the curse of dimensionality arising from partitioning the resource space. Instead, Vf{Rf) 

_^ 

can be approximated with lower-bounding convex outer approximations (Rf) whose functional 
form is the maximum over a collection of affine functions. Those are commonly known as cutting 
hyperplanes or Benders cuts, and are constructed at the resource points Rf’^ that are visited during 
the j-th forward pass, 

v\{Rf) := max {ai + (Pi,Rf - (7) 

]<k 

For example, at iteration k we would obtain Rf'^ by solving the following linear program 

xf G argmin {c{St,Xt)-i-v’l , and setting ^ Sfxf (8) 

where = 0. 

_/e—1 

The approximations ) are updated in the backward pass of iteration k by constructing 

a cutting hyperplane h^(Rf) to the optimal value function Vf{Rf). To accomplish this, we use a 
lower bound [R^'^) (derived from solutions to subproblems for time t -|- 1) to Vf {Rf'^), 

h^R^) := + iPt, Rt - RT'^)- ( 9 ) 

Please note that the hyperplane h^(Rt) is not necessarily tangent to Vf{Rf) since 
yA+iiR-t'^) might be strictly smaller than Vf{Rf'^). 
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Remark 2. When we need to emphasize the dependence of the feasible set Xt+i (S't+i) on the previ¬ 
ous post—decision state R^, we use the notation Xt+i{R^, It+i), where the exogenous information 
in Rt+i that is not contained in Rf is assumed to be contained in h+i- 

In order to construct we consider every element of the sample set cot+i G flt+i and denote 
with (i?f, uit+i) the optimal value of the following optimization problem, 




min 

(-Rf (iOt+i)) 


{ciSt+iiuJt+i),Xt+,) + ■ 


Finally, we set 

v'^+ARt)--= E p(wt+i)Zt"+i(i?r,wt+i). 

cctt+iGnt-i-i 


Hence, if we choose 

e dRVl,iR^’% 

then we can construct a new aggregated cut h^(R^) as described in equation Thus, in the 
backward pass of iteration k, we can update the approximate value function Vf (i?“) as follows, 




If none of the constructed cuts are removed, then the growing collections of affine functions 
generate sequences of monotonically increasing lower bounding approximations (i?f) to the 
optimal post-decision value functions Vf{R^) for any f = 0,..., T — 1. 

< V^Rt) < v:{R^)yk e N,f = 0,... ,T- 1. 


Furthermore, in this work we assume relatively complete recourse, i.e. for any feasible solutions 
to the optimization problems at time periods t = 0,...,T — 1, there exists a feasible solution to 
any realized stage t + 1 subproblem with probability one. This assumption alleviates the need for 
feasibility cuts and allows us to improve the clarity of the presentation. 


5 Quadratic Regularization 

Existing regularization approaches SB IB El [II utilize a scenario tree and consider separate 
incumbent solutions Xt{Ht) for every possible history Ht G RtR = 0,... ,T — 1. The underlying 
idea in such methods has been to augment optimization problems of the form ® with a regulariza¬ 
tion term as follows, 

G argmin (c{St,xt) + Vt^{Rt) + ^\\xt-Xt{Ht)\\l]. (10) 

xteXtiSt) '' ^ 

As the algorithm progresses, each incumbent solution xt{Ht) is updated to a new optimal solu¬ 
tion, if certain conditions are satisfied. While such an approach is feasible for problems on scenario 
trees with small T, it is not practical for non-trivial time horizons. The exponential growth of the 
scenario tree ensures that only a tiny fraction of all possible realizations Ht GRt, t = 0,... ,T — 1 
could be examined in the forward pass in a reasonable computational time. Moreover, multiple visits 
to each Ht G Rt, t = 0,... ,T — 1 and multiple updates of its incumbent solution are also out of 
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the realm of computational feasibility for most practical instances. One way to remedy this diffi¬ 
culty would be for different histories to share incumbent solutions. For example, a single incumbent 
solution xt can be shared among all realizations Ht G TLt and that would result in the optimization 
problem 

xf e argmin \c{St,Xt) + Vt ^ {Rf) + ^\\xt - Xt\\l] . (11) 

xt&XtiSt) ^ 


Equation (fTTI) can be used in place of equation (l8]l, and it would still result in a convergent 
method for a fixed set of incumbent solutions xt,t = 0,... ,T — 1. However, the optimality of the 
resulting policy cannot be established. Moreover, since the purpose of the quadratic regularization 
term is to mitigate the inaccuracy of the value function approximations, we do not need to regularize 
around the entire vector xt (which might be very high-dimensional) but only around the parameters 

_fe — l 

of the post-decision value function approximations {Rt)- Thus, we can adjust problem 
(fTTTl to address these concerns by making the following adjustments, 




€ arg mm 

xt&XtiSt) 


C{St,xt) + v1 \Rf\ 


0^ 

2 


- R 


k 1 , j. 


( 12 ) 


where the sequence of penalty coefficients {g^} is such that g^ > 0, Vfc G N and lim g^ = 0. 

k—¥oo 

We also introduce a positive semi-definite matrix Qt ^ 0, which can be used to address any scaling 
concerns across different entries of the resource vectors Rf. Please note that the meaning of the 
proposed regularization strategy is quite different from its scenario tree counterparts, as it aims to 
steer the solution towards a “known” region of the value function domain, rather than to the “cor¬ 
rect” solution for the given history Ht of the stochastic process. Hence, we choose the incumbent 
solutions to be the previous points encountered in the forward pass since the cuts supported at those 
points are the ones generated with the most information. Finally, we also point out that unlike the 
case of scenario trees, in the current method we do not aim for the convergence of the incumbent 
solutions towards any point. Interested readers are free to choose different incumbent solutions that 
they might find appropriate, and the convergence results presented below would still hold. 

Now, we can substitute equation (IT2l) for equation ((S]) in the forward pass of SDDP, and the new 
procedure would still converge to an optimal solution of problem (O with probability one after a 
finite number of iterations. That might appear somewhat surprising since gradient methods applied 
to quadratic optimization problems typically entail asymptotic convergence. However, in this case 
a finite number of iterations is sufficient since the true problem remains linear, and the quadratic 
terms are only used to guide the exploration phase of the forward pass. Moreover, the generation 
of the supporting hyperplanes in the backward pass utilizes linear programming problems which 
can generate only a finite number of different cuts when basic dual feasible solutions are used. The 
details of the resulting method are presented in Algorithm[Tl and we study its convergence properties 
below. 


Lemma 4 (J^ Q). Suppose that dual basic solutions are used in the solution of subproblems 
in the backward pass of Algorithm\I\ Then, there exist a finite number of possible value function 
approximations f = 0,..., T. 


Since the regularization terms are artificial for fhe original problem, we exclude them from the 
definition of an optimal policy. 

_^ 

Definition 5. The value function approximations V tfi = 0,... ,T are optimal for problem (O if 
for any realization w G fl, 


min + 1/, (Rf)} = min {C(5t(cc),x*) + VfiR^)} (13) 

xt&Xt(St{i^)) xt&Xt{St(uj)) 
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Algorithm 1 Quadratic Regularization Method with Stagewise Independence 


1: Choose Qt ^ 0, t = 0,..., T, and define sequence 
2: Define Fy(_Rf,) := k = 0,..., K. 

3: Define (Rt) •= —oo, t = 0,... ,T — 1. 

4: , /o) ^ So, k = 0,...,K 

5: for A: = 0,..., -fC do 
6: Forward Pass: 

7: Sample te S 

8: for f = 0,..., T do 

9: if (fc = 0) then 


Select Xf S argmin {C(St(a;), xt)} 




11: else 


if t < T then 


x?e argmin { C{StH, xt) + \r-) + S^^/r- - R^’^ ) 


Select E arg min 




end if 
end if 

Seti?^’'' ^S'=x*;5t+iH ^ (i?"- 6t+iH,/t+iM) 

end for 

Backward Pass: 

for t = T,.. ., 1 do 


Define := min |c(St(a;*),^t) + V?(i?f )| 


for all u)t € do 


SelectG diix_^V^{RlX‘^t) 

25: end for 

26: E (R?’", cat);^ ^ 

27: := 

28: Ftl(«f-l) := max {v'lZl {Rl-l) ^ 

29: end for 


inin C(So,xo) + yo(«o) 
a:oGA’o(5o) 


31: Rt’'" <-t = 0,... ,T - 1 

32: end for 







fort = 0,...,T. 

Theorem 6. Suppose that Algorithm\I\satisfies the following assumptions: 

1. Vt{-) = ken. 

2. Dual basic optimal solutions are used in the backward pass. 

3. Every element uj e kl has a strictly positive probability P(a;) > 0. 

4. >Q and lim = 0. 

k—fcio 

5. The feasible sets Xt (St) are bounded for each t = 0,... ,T. 

Then, the regularization method presented in Algorithm\T\converges to an optimal policy of problem 
0 after a finite number of iterations with probability one. 

Proof. Proof; Let Vt denote the set of all possible value function approximations 

t = 0,... ,T that can be generated by the backward pass of Algorithm[T] Since according to 
Assumption|2]we use only dual basic optimal solutions in the backward pass, by Lemma|4]we know 

that the sets Vt have finite cardinality for alH = 0,... ,T. Thus, we know that as the algorithm 

_^ 

progresses all the value function approximations Vt will eventually stabilize. Therefore, there exists 

_^ 

an iteration index fci G N after which no updates can be made to the value functions Vt,t = 
0,... ,T for fc > fci. If the value functions Vt^, t = 0,... ,T are optimal for problem (|2]l, then we 
are done. 

Now, suppose that was not the case. Then there exists t', 0 < t' < T, and cu' G fl such that for 
any fc > /ci we have 

min {CiSt'iuj'),Xtfi + Vt1iR^,iu:'))} >mm {C{St'{uj'),Xtfi+ Vt~\Rt'i^' 

X,,{s,, (L')) GX,, {s,, (L')) 


Let us consider the set 


A=\Se 


6=^m {CiStiiofxt) + VfiRUc^))} - mm {C{St{u:fxt) + VtiRU^))} : 

xtGXt{St(i>)) xtGXtiStiiS)) 

{CiStiu;),Xt) + Vt*iRnu^))} > min x*) + F*, 

xtGXtiStioj)) xt(^Xt{St{iS)) 


(14) 


where w G fl, and Vt G Vt, t = 0. 




Since the number of elements w G fl is finite, we know that the set A also has a finite number of 
elements. Thus, Z\ has a minimum element, and we denote 


e = min A. 


Hence, 


min {C'(5't/(w'),a:*/) + - min {C{St'{u;'),Xt') 

,&x,, (s,, (L')) x\, ex,, {s,, (L')) 


-Vtr\R^,{u:'))}>e 
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And if i' > 0 we know that 




tj(/ eQfi 


x,,eX{R^,_^,h{u;,,)) 


and using convexity, 

-k-l 


V,,_,iRR-i)< E ,ACiSt,icut,),xt,) + V,, (Rt,)}. 






Therefore, 


min {C{St^_,{uj'),xt^.,) + V;_,iR^,_,{uj'))} 

Xtf _l{S^/_i{uj )) 

> min {CiSt,.^iu;'),Xt>-^) + vA-\iRt'-li^'))}, 


(15) 


which implies 

min {CiSt,.^iu;'),Xt,.^) + V;_,iR^,_,iuJ'))} 

{CiSt>-iiu}'),xt>-i)+ vA-\iRR-ii^'))} 

Proceeding by backward induction, we know that 


min {C(.So,^o) +V(^S)}- min {^(^o, xq) + v" '(-Rq)} > e- 

a:o€A:’o(S'o) xq^?(^q{Sq) 

Moreover, using Assumption |5] we know that Rf is bounded for each t. Hence, without loss of 
generality we can assume that k is such that 


■k-l 


k-l. 


g^{R^-Rt’ ,QtiRt-Rt’ )) < e, for f = 0,... ,T - 1. 


Hence, if we denote with Xq the solution to the following regularized problem, 
x^o = argmin {C(^o,a;o) +F^'(i?g(cu')) + Qo(i?§(cu') - ^o’""'))} 

xoGXq(So) 

then we know that 


k 


rnin {C(5o,xo) + Ho*(i?5)} > CiSo,x^o) + (^o’ ) + - ^o’ ^QoiR^ - 

aio^'^o(5o) 


And since Q is positive semi-definite, we know that 

inin {C(5o,xo) + Ho*(^S)} > C{So,x'^)+ vA\R^oA 

xqG^o{Sq) 


which implies, 


and therefore 


CiSo,x'^) + V*{rA) > C(So,x^o) + Vo~ (RA)- 


Vo*(rsA>vA'(rA)- 
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Thus we know that the value function approximation Vq (•) is suboptimal at the point i?g’ cor¬ 
responding to Xq. Hence, if the value function is such that for each wi G Hi the following 

holds, 

min{C(5Ha;i),Xi)+Fj-\i??)} =min{C(5i(a;i),Xi) + Fi*(i??)} 

then the backward pass will result in an updated value function yo(’) that Vq{Rq’'^) = 
Vq{Rq’'^) > Vq which is a contradiction with the choice of k. Therefore, it must be 

the case that there exists w" G Hi such that 

min{C(^iK),a:i) +Fi~^(i?^)} < min {C{Si{u;'l),xi) + V*{R^,)}. 

xiGX(Rg''‘ ,Ii(ui")) xiGXiRg''^ ,Ii(ui")) 


Moreover, 

min {C{St{u:),xt) + Vt~\Rt)] = min ^ xt) + 

Therefore, there exists a sample path w" G H and a time index t", 0 < t" < T such that the 
sequence of regularized solutions xUu^n would result in a suboptimal value function evaluation at 
t", 

mm{C{St.Roj"),Xt'') + v'l~\Rl.)) < mm{C{StsRu^),Xt») + V:„{Rl.)), 

x,„GX{R^^,f_^,I,„{u:")) x,„GX(R‘l;j^_^,I,„{u:")) 

and optimal evaluations at t" + 1 for all possible ujt"+i G Ht//_|_i, 


{CiSe>+iiu:t"+i),xe>+i) + 4,+i(it-?=.+i)} 

{C'(S't"+i(a;t//+i), Xt"+i) + . 




min 

+i(^t"+i)) 


Hence the backward pass of iteration k will result in an updated value function approximation V^,, (•) 
such that 

v':„{Rph = v;,iRf,’^) > 

which is a contradiction with the choice of k. This completes the proof. 
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Forward Pass 


Backward Pass 


o 
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Figure 1: Value function update at iteration k. 
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6 Beyond Stagewise Independence 

Despite its advantages, the SDDP methodology has one crucial drawback. The stagewise inde¬ 
pendence of Wt = {At, Bt,bt,ct) will generally not hold in practice since real-world multistage 
problems often involve stochastic processes that exhibit some degree of temporal dependence. There 
are different approaches that we can adopt to address this difficulty. First, let us consider the special 
case when the history dependence occurs only in the right hand side constraint vectors bt, and it has 
the following autoregressive structure: 

t-i 

bt = '^ i^t,t'bt' + + m (17) 

t'=i 

where the process (At, Bt,Ct,r]t) is stagewise independent and the deterministic matrices and 
T't contain the autoregressive information. Then, for each time period f > 0 in the SDDP formu¬ 
lation, we can extend the original optimization problem with additional variables to accommodate 
the realizations of bt' and rf ,t' < t that are necessary to model the autoregressive dependence 
(see Sin, and Q)- The advantage of such a solution to the history dependence problem is that 
stagewise independence is present in the extended formulation. A drawback of the approach is that 
the dimension of the state space also increases from \Rf\ (in the stagewise independence case) to 
possibly as much as \Rf\ + \ + \Vt'\) (in the history dependent case), which implies a 

slower convergence rate (note that we can omit terms \bt' \ if $t,t' = 0 , and |r?t'JJf T't.t' = 0 ). This 
problem can be alleviated with the use of cut sharing strategies as described in llllll . and iQ]. 

In the remainder of this section we consider an alternative setup that leads to an increase in the 
information dimension rather than the resource dimension. We assume that the stochastic process 
Wt is a discrete state Markov chain. Thus, the probability of occurrence of oJt+i G depends 
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only on the current realization wt S or the current post-decision information state If, 


n^t+i\Ht) 


P{uJt+i\St)=F{iOt+i\If), iff = 0 

¥{ujt+i\ujt) = if f > 0. 


( 18 ) 


Such an approach can be suitable for problems where the process {At, Bt,ct) is not stagewise inde¬ 
pendent, or the autoregressive model (fTTI) does not constitute a good fit to the observed realizations 
of the random process. For example, historical weather data might indicate the presence of distinct 
patterns that cannot be explained with a normal error distribution around a given mean (which arise 
in autoregressive estimation). Alternatively, the relevant information state could be the forecast of 
the highest temperature tomorrow. 

To properly model such weather dynamics one might need to consider different weather regimes 
that are inherently distinct. Thus, multiple approximations of the value functions need to be em¬ 
ployed, which increases the size of the optimization problem. That leads to greater computational 
requirements for solving the problem as a distinct recourse function approximation needs to be con¬ 
structed for every If G If, where If denotes the set of all possible post-decision information states 
at time t. Hence, we need to maintain |If (Ht)| sets of cuts for each time period t = 0,... ,T, and 
therefore the approach is suitable for problems where the cardinality of the possible post-decision 
information states \Xf{Vlf)\ is small, or alternatively when the cardinality of the sample sets | is 
small. However, unlike the case of an autoregressive ht ( fT7b . the dimension of the post-decision 
resource state is preserved in each set of cuts, and the corresponding exponential increase in the 
computational time is avoided. 

In the forward pass at iteration k, we consider a sample path uj = {uji,..., lot) that is generated 
using (HI. At each time step t = 0,..., T— 1 the piecewise-linear value function ^ (i?f, If {to)) 
is used to approximate the optimal value function Vf (i?f, If{Lo)) at the current post-decision infor¬ 
mation state If {lo). 

In the backward pass of the algorithm at iteration k, we consider t = T,... ,1 and generate the 
cutting hyperplanes h^_^{Rf_^, If_^) for each /f_j^ G If_i. Please note that if the random process 
Wt is a finite state Markov chain, then |If (Hf)| < |H(|, f = 0,..., T. We employ the conditional 
probabilities P{ijJt\If_i) to construct constant intercepts and slopes. 




and. 


Thus, 




hUiRUJU) ■■= + {Pt,{it-i),RU - RtL\). 


Hence, we can construct the new value function approximation Vt-i{Rf-i-, If-i) for the post¬ 
decision information state If_i as. 


rrrk 


vU{RU,It-i) ■■= max{yti hU{RU,It-i))- 

The description of the method is given in Algorithm|2l 


( 19 ) 


Theorem 7. Suppose that { Wt ,t = 1,... ,T} is a discrete Markov process as described by equation 

( l7iSI ). IfVTi',!^) = h^('j -^t)’ fa’’ ^ k = 0,..., K, and conditions 2, 3, 4, and 5 specified 
in Theorem^are satisfied, then the method presented in Algorithm\2\converges to an optimal policy 
of problem (O after a finite number of iterations with probability one. 
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Algorithm 2 Quadratic Regularization Method for Markov Models 


Choose Qt ^ 0, t = 0,..., T, and define the sequence 
Define := (Rf,,/f,), k = 0,...,K, /f, G Xf.. 

Define (-Rj , '■= —oo, cjt € Ot, t = 0,..., T — 1. 

(R!l’l-^o) ^-So, k = 0,...,K 

for fc = 0,.. ., R do 

Sample te G ft using the Markov stochastic process {Wt,t = 1,..., T}. 
for f = 0,.. ., T do 
if {k = 0) then 


Select Xf G arg min 




{C(Stiuj),xt)} 


10 : 

11 : 


else 

if t < T then 


12 : 


4 e 


arg mm 

^x,k 
4-1 ’ 




13: 

14: 


else 


4 G 


arg min 

xteXt(R^L\,it(<^)) 


15 

16 

17 

18 
19 

20 : 


end if 
end if 
Set r4 
end for 

for f = T,..., 1 do 




Define yJ=(Rf_l,aJt) := 


xt€Xt(R<l_^,It(u,t)) 


{c(St(a;t),xt) + y?(R?,/f(att))} 


21 : 

22 : 

23 

24 

25 

26: 

27 

28 
29 


for all ut G fit do 

Select^J=(a;t) G 9H*_^k*=(R^i'\,a;t) 

end for 

for all G Xf_i(ftt-i) do 

/rti(R?_i, /f_i) := + {4-l{ir-l), Rf -1 - K'-l) 

:=max{yt4i(Rf_l,/f_i), 4_i{Rf-i, If-i)} 

end for 
end for 


30: 


yg ^ / min C(So, xo) + yg (Rg,/?) 
—u 1^ xoGA:’o(‘5o) 


31: Rg’" 

32: end for 


Rg 


t = 0,... ,T- 1 
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Proof. Proof; The proof is analogous to the proof of Theorem |6l The main difference is that for 

_ 

each time period f = 1,..., T we need to consider \ If \ different value functions (i?f, If). Since 
\If I is finite, the argument of the proof of Theorem[6]can be extended to show that with probability 


1, there exists a large enough k G N such that the value functions If ) 

all wen, and If G If = 0,... ,T. 


are optimal for 

□ 


Remark 3. Various optimization methods for Markov models have been studied in the literature 
for both the risk—neutral (see [2^ ^ ~3^1 ) and risk-averse cases (see and the refer¬ 

ences within). An extensive treatment of optimization problems with Markov uncertainty is beyond 
the scope of this work. The goal of our presentation is the introduction of regularization into the 
Markovian setting, so that it can be adapted to other problems on a case-by-case basis. 


1 Algorithmic Tuning 

In order to turn mathematical arguments into useful numerical results one needs to employ a high 
quality implementation and suitable parameter tuning. In this section we present some of the poten¬ 
tial issues regarding the reliability and computational performance of the methods presented above. 
We consider the construction of regularization sequences, and discuss numerical concerns regarding 
the solutions of subproblems. 


7.1 Regularization Coefficients 


In general, we cannot find a regularization sequence that would lead to the fastest possible con¬ 
vergence. However, if we consider sequences that are defined by a set of parameters, then we can 
attempt to find suitable parameter values. For example, we can construct regularization sequences 
> 0 such that lim = 0 by using the following geometric sequence. Given p° > 0 and 

k—¥CO 


r G (0,1), we define 


p'= = pV=r-p'=-\iffc>0. 


( 20 ) 


In this case, we need to tune the parameters p° and r. We can gain insight by solving a small 
instance of the given problem for different pairs (p°, r). For example, in section [8] we describe an 
optimization model to be solved for high-dimensional post-decision resource states \Rf\ > 50. 
As a pre-processing step, we can solve a smaller instance, e.g. \Rf\ — 25, for each (p°,r) G 
{1,10,100} X {0.9,0.95,0.99}, and compare the results. Since estimates of the upper bounds and 
optimality gaps are stochastic, we prefer to compare only the deterministic lower bounds as they 
are more reliable. The resulting plots can be found in Figure |2l We can see that the sequences 
of regularization coefficients has an impact on the behavior of the proposed methods. However, 
various choices of (p°, r) can be used with similar success. In our experiments in section[8] we use 
pO = 1, r = 0.95. 
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Markov Uncertainty for \R^ \ = 25 



Iteration Iteration 

- = r = 0.9 --- r = 0.95 . pO = 1, r = 0.99 

- gO = 10, r = 0.9 - - - 0 ° = 10, r = 0.95 . ^,0 = 10, r = 0.99 

■- Q° = 100, r = 0.9 - - - 0 ° = 100, r = 0.95. = 100, r = 0.99 


Figure 2: Lower bounds to the objective value of a stochastic optimization problem for different 
regularization sequences. 

7.2 Convergence Tolerance for the Solution of Subproblems 

At each step of the forward and backward pass of Algorithm[T]and Algorithmic we use the current 
collection of hyperplanes { «( + {P^, Rf — j < k} and a realization of kUt = {At,Bt,bt,Ct) 

as a part of the input to a convex optimization problems having the following general form, 

min {c,y) + ^{y,Qy) 
s.t. Ay = b 
2 / > 0 

The numerical precision of the solutions to subproblems (| 2 TI) is essential for the correctness of the 
resulting policy for problem ([T]). However, the right-hand side vector b of problem (1211 1 includes 
the vector bt and the constant terms al — Pi R^’^ of the value function approximations given in (O 
or (fT^ . If problem ([il has a long time horizon, then an aggregation of constant terms with large 
modulus \al — Pi R^’^ \ can occur, and that could lead to numerical solutions of problem || 2 T]) which 
do not satisfy the system of constraints Bt-iXt-i + AtXt = bt with a desirable precision. Convex 
optimization tools, including specialized algorithms for linear and quadratic programming problems, 
often use convergence tolerance parameters to guide their stopping conditions. For problems with 
long time horizons, we encounter numerical precision problems that require that we use care in 
setting tolerance parameters for stopping conditions. In the sections below, we discuss the issues of 
relative primal feasibility, and the relative complementarity gap. 
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7.2.1 Relative primal feasibility 

Suppose that a given optimization solver has a feasibility condition of the following form, 


\\^y-b\\ 
i + IIHI - 


( 22 ) 


We can consider two right-hand side vectors 5^,6^ such that ||6^|| < 

12 u 1 11 ^ 2 /^ “('^11 

candidate solutions y,y such that —;— = £/• 


i + m 


1 + 1162 


||62|| and corresponding 


Then the feasibility errors satisfy \\Ay"^ ~ 6^|| < WAy"^ — 6^||. Therefore, if we keep the primal 
feasibility tolerance ej fixed while ||6|| grows, then the feasibility errors \ \Ay — 6|| (and therefore 
\\Bt-iXt-i + AtXt — bt\\) could increase as well. Hence, for problems with a long time horizon or 
a large number of hyperplanes in the value function approximation, one might need to decrease the 
tolerance e/ for problem (12111 in order to bring the size of the error \ \Bt-iXt-i + AtXt — 6* 11 down 
to an acceptable level. 


7.2.2 Relative complementarity gap 

Commercial solvers often include implementations of primal-dual interior point methods (see 
0]) that employ a relative complementarity tolerance Sc in their stopping condition. The presence of 
large (by modulus) constant terms in the right-hand side vector 6 can also lead the numerical solver 
to terminate at an infeasible solution with non-negligibleerrors | \Ay — 6|| and \\Bt-iXt-i + AtXt — 
6t 11, if Ec is not chosen appropriately. We present a brief explanation below. 

The Lagrangian of problem (l2Tl l is given by 

L{y, y, A) = (c, y) + ^(y, Qy) + (y, 6 - Ay) - (A, y) . (23) 

Hence, the Karush-Kuhn-Tucker conditions for problem (l2Tll are given by the system of constraints. 


Ay = b 
AJ y, — Qy + A = c 
FAl = 0 
y, A > 0 


(24) 


where + = diag( 2 /) and A = diag(A). 

Interior point methods construct iterative approximations to the solution of ll24l) using a sequence 
of scalar barrier parameters > 0, such that 0. Assuming that the initial point yP,X^) is 
infeasible for (l24ll and (j/°, A°) > 0, we can have a stopping condition for the complementarity gap 
such as 


(y",A") ^ _ 

(y°,A0) 


or < Ec 


or 


|(c,y") + (y",Qy")| 


< Ec- 


(25) 


At iteration n, the interior point method finds a Newton direction (4\y, Ay, AX) for problem (l24ll 
as the solution to the following system : 


o 

o 


Ay 


b-Ay^ 

-Q / 


Ay 

= 

c - + Qy^ + X^ 

1 

+ 

o 

_1 


. . 


- y”A"l 
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where / = diag(l, 1 ,..., 1 ) denotes the identity matrix. 

Then the current solution (y”, /i", A") can be updated by choosing 7 ” S (0,1] such that 

(y",A”) + 7”(/\y,/\A)>0 (27) 

and setting 

( 2 /"+\ /r-+\ A"+i) = (y", m", A") + Ay., AX). (28) 

Please note that if 7" = 1 , then > 0 would be feasible for problem ||2T]) since = b. 

However, in practice we usually have 7" < 1. Hence, a complementarity tolerance condition (|25]) 
can be met even if the system Bt-iXt-i + AtXt = bt is not satisfied within the desired precision. In 
order to address this concern, in our numerical experiments we set the tolerance Eq to the smallest 
possible value allowed by the solver ( 10 “^^). 

8 Numerical Experiments 

In this section we study the computational performance of the algorithms proposed above. We focus 
our analysis on the following questions. 

• How is the computational performance of Algorithms [T] and |2] affected by: 

- the dimension of the resource vector i?t? 

- the size of the post-decision information state space 1^1 

• How does the performance of Algorithms [T] and |2] compare to their non-regularized counter¬ 
parts? 

Our experimental work was conducted using the setting of optimizing grid level storage for a 
large transmission grid managed by PJM Interconnection. PJM manages grid level storage devices 
from a single location, making it a natural setting for testing our algorithms. As of this writing, grid 
level storage is dropping in price, providing a meaningful setting to evaluate the performance of our 
algorithms for a wide range of storage devices, challenging the ability of the algorithms to handle 
high dimensional applications. For this reason, we conducted tests on networks with 50 to 500 
storage devices. These are much higher dimensional problems than prior research that has focused 
on the management of water reservoirs. 

Another distinguishing feature of our grid storage setting (compared to prior experimental work) 
is that a natural time step is 5 minutes, which is the frequency with which real-time electricity prices 
(known as LMPs, for locational marginal prices) are updated on the PJM grid. We anticipate using 
storage devices to hold energy over horizons of several hours. For this reason, we used a 24 hour 
model, divided into 5-minute increments, for 288 time periods, which is quite large compared to 
many applications using this algorithmic technology. 

A complete description of the given model is beyond the scope of the current paper and can 
be found in |[l|]. Below we briefly describe the construction of the network, and the exogenous 
stochastic process. Finally we present the results of an extensive set of experiments investigating 
the effect of regularization, the number of storage devices (which determines the dimensionality of 
i?t), and the presence of an exogenous post-decision information state, on the rate of convergence 
and solution quality. 
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8.1 The network 


We performed our experiments using an aggregated version of the PJM grid. Instead of the full net¬ 
work with 9,000 buses and 14,000 transmission lines, we limited our analysis to the higher voltage 
lines, producing a grid with 1,360 buses and 1,715 transmission lines. The power generators include 
396 gas turbines (23,309 MW), 50 combined cycle generators (21,248 MW), 264 steam genera¬ 
tors (73,374 MW), 31 nuclear reactors (31, 086 MW), and 84 conventional hydro power generators 
(2,217 MW). Off-shore wind power was simulated for a set of hypothetical wind turbines with a 
combined maximum capacity of 16 GW. Moreover, we consider a daily time horizon with 5-minute 
discretization resulting in a total of 288 time periods. 

The data was prepared by first running a unit-commitment simulator called SMART-ISO that 
determines which generators are on or off at each point in time, given forecasts of wind generated 
from a planned set of off-shore wind farms. We made the assumption that the use of grid level 
storage would not change which generators are on or off at any point in time. However, we simulta¬ 
neously optimize ramping the generators up or down within ranges, while charging and discharging 
of storage devices around the grid in the presence of stochastic injections from the wind farms. 

We placed the distributed storage devices at the points-of-interconnection for wind farms, as 
well as the buses with the highest demand. Each storage device is characterized by its minimum 
and maximum energy capacity, its charging and discharging efficiency, and its variable storage cost. 
The control of multiple storage devices in a distributed energy system is a challenging task that 
depends on a variety of factors such as the location of each device, and the presence of transmission 
line congestion. A good storage algorithm needs to respond to daily variations in supply, demand 
and congestion, taking advantage of opportunities to store energy near generation points (to avoid 
congestion) or near load points (during off-peak periods). It has to balance when and where to store 
and discharge in a stochastic, time-dependent setting, providing a challenging test environment for 
our algorithm. 

8.2 The exogenous information 

Our only source of uncertainty (the exogenous information) was from the injected wind from the 
offshore wind farms. In order to calibrate our stochastic wind error model, we employed historical 
wind data and speed measurements of off-shore wind for the month of January 2010. For each time 
period, we consider a set of ten vectors of possible wind speed realizations which correspond to ten 
different weather regimes. 

In general, the exogenous information process can be characterized by one of the following: 
stagewise independence, compact state variables (Markov processes), or scenario-dependence (path 
dependence). For some instances, the latter case could be reduced to one of the former two by 
applying an appropriate transformation as described in section [6l In our experiments, we consider 
instances with stagewise independent transitions between ten equally likely scenarios. When we 
assumed stagewise independence, we would sample from each of these 10 scenarios with equal 
probability at each time period. For the problems with Markov uncertainty, we assumed that at 
every time period t, the probability of continuing with the same weather regime at time f -f 1 is 
91 percent. Additionally, each of the remaining nine regimes can be visited at time t + 1 with a 
probability of 1 percent. 

8.3 Algorithmic comparisons 

The proposed algorithms were implemented in Java, and the IBM ILOG CPLEX 12.4 solver was 
used for the solution of linear and quadratic convex optimization problems. Further, we performed 
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Wind Power Simulations 



_ Time _ 

- Simulation 1 - Simulation 2 - Simulation 3-Simulation 4 - Simulation 5 

- Simulation 6 - Simulation 7-Simulation 8 Simulation 9 Simulation 10 


Figure 3: Simulated daily realizations of wind power for a given wind farm over 24 hour time 
horizon. 


parameter tuning as described in section|2l We set the relative complementarity tolerance of CPLEX 
to 10“^^, and used a geometric regularization sequence with = 1 and r = 0.95. Additionally, 
we run each method for K = 300 iterations. Moreover, the scaling matrices Qt,t = 0,..., T are 
set to the identity matrix which implies that the amount of energy in each storage device has the 
same weight in the regularization term. In this section we examine the performance of Algorithms 
[T] and [2] when the number of storage devices (dimension of the resource state variable) is |i?f I = 
50,100,200,500. 

Plots of the behavior of the methods can be found in Figures|4l|5][6l[7]below. Each figure shows 
the results for stagewise independence on the left, and Markov uncertainty on the right. These 
graphs show the convergence of the upper and lower bounds, illustrating the dramatic impact of 
regularization, especially as the number of dimensions grow. The results suggest that we consistently 
obtain high quality solutions within approximately 50 iterations for all problems. 
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Value ($) Value ($) 


Stagewise Independence for |i?“| = 50 



^ Markov Uncertainty for \Rf\ = 50 



Iteration 


"igure 4: Numerical comparison of multistage stochastic optimization methods for \Rf\ = 50 


Stagewise Independence for \Rf \ = 100 ^Markov Uncertainty for \Rf\ = 100 



Figure 5: Numerical comparison of multistage stochastic optimization methods for |i?“| = 100 
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^Mai'kov Uncertainty for |i?f| = 200 



Iteration 


Figure 6: Numerical comparison of multistage stochastic optimization methods for |i?“| = 200 


Stagewise Independence for \Rf\ = 500 



^Markov Uncertainty for |i?f| = 500 



Iteration 


Iteration 


Figure 7: Numerical comparison of multistage stochastic optimization methods for |i?“| = 500 


Table [Hand Table [2] show the CPU times (in seconds) per iteration for problems with 50 to 500 
storage devices, with stagewise independence and Markov uncertainty, for up to 300 iterations. We 
note that in a practical application, the algorithms would be run offline (for example, the day before, 
given a particular forecast of wind). The cuts would be stored and then used in real time the next 
day. This would be easily implementable in a policy updated every 5 minutes. 
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Iterations 

1 

50 

100 

150 

200 

250 

300 

50 

Algorithm 1 
SDDP 

m 

217.6 

196.2 

230.8 

201.2 

248.3 

208.9 

266.5 

215.4 

284.3 

223.4 

299.2 

230.4 

100 

Algorithm 1 
SDDP 


306.2 

250.0 

334.3 

262.2 

371.7 

275.1 

412.9 

296.0 

453.9 

319.7 


200 

Algorithm 1 
SDDP 

m 

358.8 

375.3 

414.3 

360.7 


587.6 

428.8 

653.5 

469.8 

726.0 

525.5 

500 

Algorithm 1 
SDDP 

m 

664.0 

426.5 

828.4 

564.6 

995.4 

651.5 

1183.3 

751.6 

1673.5 

869.8 

2536.0 

1003.2 


Table 1: Computational time per iteration (in seconds) for risk-neutral stochastic optimization meth¬ 
ods. 


Iterations 

1 

50 

100 

150 

200 

250 

300 

50 

Algorithm 2 
MSDDP 

180.0 

181.0 

225.6 

192.7 

239.1 

198.2 

258.2 

206.3 

277.0 

213.3 

294.2 

221.8 

310.1 

228.9 

100 

Algorithm 2 
MSDDP 

256.0 

255.0 

309.5 

255.7 

336.1 

267.2 

371.5 

279.3 

411.2 

300.8 

450.3 

325.1 

495.6 

347.1 

200 

Algorithm 2 
MSDDP 

296.0 

339.0 

364.6 

301.6 

422.2 

319.4 

513.0 

364.9 

592.6 

409.3 

657.4 

454.8 

731.1 

515.1 

500 

Algorithm 2 
MSDDP 

542.0 

338.0 

650.3 

434.7 

799.5 

586.7 

959.4 

674.3 

1151.7 

777.2 

1637.6 

886.8 

2490.4 

1004.1 


Table 2: Computational time per iteration (in seconds) for risk-neutral stochastic optimization meth¬ 
ods. 
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9 Conclusion 


Large scale multistage stochastic optimization problems with long time horizons arise in numerous 
real-world applications in energy, finance, transportation and other fields. The numerical solution 
of such models can be computationally demanding, often causing practioners to face a trade-off 
between solution quality and computational time. 

In our work we have developed regularization methods for the SDDP framework and studied 
their convergence. The algorithms employ regularization terms in the selection of cutting hyper¬ 
planes which improve the quality of the resulting value function approximations. The proposed 
techniques feature straightforward implementation and can be quickly integrated into existing soft¬ 
ware solutions without the need for major additional efforts in development and testing. 

In order to assess the performance of the proposed approach we consider a model for the integra¬ 
tion of renewable energy using distributed grid-level storage into the grid of PJM, one of the largest 
regional transmission operators in the United States. Our numerical experiments indicate that the 
proposed regularized algorithms exhibits significantly faster convergence than their non-regularized 
counterparts, with greater gains observed for higher-dimensional problems. 

In the future we can consider several extensions of the current work. One possible direction 
would involve further investigation into the selection of appropriate regularization terms and coef¬ 
ficients. Another possible path of exploration would be the application of regularization techniques 
for the solution of risk-averse models involving time-consistent compositions of coherent measures 
of risk along the lines of Eli], and El. Additionally, we would also like to extend the pro¬ 
posed approach to the solution of multiobjective stochastic models El- Finally, obtaining further 
empirical results and insights from problems in the field would also be a subject of great interest. 
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